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Abstract: 

An important problem in tree breeding is optimal selection from candidate pedigree members 
to produce the highest performance in seed orchards, while conserving essential genetic diversity. 
The most beneficial members should contribute as much as possible, but such selection of orchard 
parents would reduce performance of the orchard progeny due to serious inbreeding. To avoid 
inbreeding, we should include a constraint on the numerator relationship matrix to keep a group 
coancestry under an appropriate threshold. Though an SDP (semidefinite programming) approach 
proposed by Pong-Wong and Woolliams gave an accurate optimal value, it required rather long 
computation time. 

In this paper, we propose an SOCP (second-order cone programming) approach to reduce this 
computation time. We demonstrate that the same solution is attained by the SOCP formulation, 
but requires much less time. Since a simple SOCP formulation is not much more efficient compared 
to the SDP approach, we exploit a sparsity structure of the numerator relationship matrix, and 
formulate the SOCP constraint using Henderson’s algorithm. Numerical results show that the 
proposed SOCP approach reduced computation time in a case study from 39,200 seconds under 
the SDP approach to less than 2 seconds. 

Keywords: Semidefinite programming, Second-order cone programming, Tree breeding, Optimal 
selection, Group coancestry, Relatedness, Genetic gain 

AMS classification: 90C22 Semidefinite programming, 90C25 Convex programming, 92-08 Bi¬ 
ology and other natural sciences (Computational methods). 


1 Introduction 

The usage of mathematical optimization approaches for tree breeders have been increasing PUS 
EDI E3], since one of their purposes is to derive better performance from seed orchards. When 
tree breeders make a plan for new seed orchards, they determine the contributions of candidate 
pedigree members so that the resultant orchard maximizes response to the selection. From the 
viewpoint of mathematical optimization, the simplest form of the optimal selection problem is: 

max : g T x 

subject to : e T x = 1 

l < X < u. 
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Here, we assume that the number of the candidate members is m. The variable vector is x £ M m , 
and corresponds to the contributions of the candidates. The first constraint e T x = 1 indicates 
that the total contribution of candidate members is unity. We use the vector e £ M m to denote 
the vector of all ones, and the superscript T to denote the transpose of a vector or a matrix. In 
the second constraints, l £ M m and u £ M m are element-wise lower and upper bounds on the 
contributions, respectively. The performance measure appears in the objective function as its 
coefficient g = (gi, ■ ■ ■ ,gm.) T i and the estimated breeding value (EBV) |T3j is often employed for 
g. The values g±,... ,g m are calculated separately before we solve the optimal selection problem, 
so we can consider gi, ■ ■ ■ ,g m as constant values. 

The above problem is a simple linear programming problem, therefore, a greedy method is 
enough to solve it. Such solution includes the candidates corresponding to the highest EBV as 
much as possible, and it is most efficient in situations where all the pedigrees are independent. 
Even if the pedigrees are independent, diversity is still an issue. Lindgren et al. m discussed a 
linear deployment in which the contributions of the candidate members are proportional to their 
EBVs. In practical situations, however, we cannot overlook the effects due to the relatedness that 
accumulates over cycles of breeding the pedigrees. 

To reflect the effect of the relatedness, Cockerham [0] extended the definitions of coancestry 
coefficients in order to include coancestry of a group. The group coancestry on the contributions 
x is calculated with the formula x 2 x , where A £ M mxm i s the numerator relationship matrix 
of Wright [29] (we will review a formula for the numerator relationship matrix in Section [ 2 ]). 
Introducing a constraint to keep group coancestry under an appropriate level 9 £ M, Meuwissen m 
proposed a formulation of optimal contributions: 

max : g T x 

subject to : e T x = 1 

x T Ax < q (1) 

l < X < u. 

Meuwissen developed an iterative method based on Lagrangian multipliers to solve this opti¬ 
mization problem, and his method has been used in breedings, for example, m si gi]. It is a 
characteristic of this method that some variables Xi may be fixed to its lower or upper bounds (/j 
or Ui) during the iterations, and Pong-Wong and Woolliams |20] demonstrated that the method 
did not always obtain the optimal solution. 

Pong-Wong and Woolliams utilized the structure of the numerator relationship matrix A to 
formulate the problem (jTJ) into a semidehnite programming (SDP) problem. SDP is a convex 
optimization problem that maximizes a linear objective function over the constraints described 
as linear matrix inequalities. The research in 1990s, for example lama, extended interior-point 
methods from linear programming problems to SDP problems. Based on primal-dual interior- 
point methods, software packages (like SDPA [3D], SDPARA [3T], SDPA-C [52] . SDPT3 [2D] , and 
SeDuMi [25] 1 have been developed for solving SDPs. Using the SDP formulation, Pong-Wong and 
Woolliams obtained the exact optimal value of 0. The number of candidate members discussed 
in [20], however, was limited to only small sizes, m < 10. Ahlinder et al. [|lj implemented the 
SDP approach into a software package called OPSEL [T?j^] with the help of the latest version of 
SDPA (a high-performance SDP solver) [30j- They solved large problems (m > 10,000) that were 
generated from real Scots pine pedigrees and performance data, and they also focused on flexibility 
and re-optimization of the SDP approach. 

A main obstacle in the numerical tests of [1] was that the SDP approach took rather long 
computation time. They reported for their case study that it required five-hours of computation 

J http://www.skogforsk.se/opsel 
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time for a problem of the size rn = 12,000. Even though the SDP guarantee the optimal solution, 
the computation time is rather long for operational application and requiring significant truncation 
of the candidate list prior to optimizing the selection. 

In this paper, we propose a second-order cone programming (SOCP) approach. SOCP is a 
convex optimization that maximizes a linear objective function over second-order cone constraints. 
SOCP can be considered as a special case of SDP, and can be efficiently solved with interior-point 
methods in a similar way to SDP [231 [27 ] . Lobo et al. m discussed wide-range applications of 
SOCP, for example, filter design and truss design, and Sasakawa and Tsuchiya applied SOCP to 
magnetic shield design [22]. Alizadeh and Goldfarb [2] surveyed theoretical and algorithmic aspects 
of SOCP, and the software packages SDPT3 [26] and SeDuMi [25] can solve not only SDP but also 
SOCP using the primal-dual interior-point methods. In addition, ECOS [5] was also implemented 
recently to solve SOCP problems. 

We first discuss that the proposed SOCP formulation also attains the optimal solution of (JTj). 
Since SOCP is a special case of SDP, we could expect that a simple SOCP formulation would 
be enough to reduce the computation time. However, preliminary numerical tests showed that 
the simple formulation did not perform well. We therefore utilize a sparsity embedded in the 
numerator relationship matrix and establish a more efficient SOCP formulation. Furthermore, we 
integrate Henderson’s algorithm [8] into this formulation. Numerical tests with the data including 
Scots pine showed that the SOCP formulation with Henderson’s algorithm reduced a great amount 
of computation time. For the case of m = 10,100, we attained a speedup of 20,000-times compared 
to the SDP approach. 

The rest of this paper is organized as follows. Section [2] describes the SDP approach of Pong- 
Wong and Woolliams and discusses a simple SOCP formulation. In Section 3| we propose SOCP 
formulations and derive an efficient method to solve problem (jTj) . Section [4 shows the numerical 
results to verify the computation time reduction for problems of various sizes. Finally, Section [5] 
gives conclusions and discusses future directions. 

2 SDP formulation and simple SOCP formulation 

Since a principal characteristic of our problem 0 is determined by the numerator relationship 
matrix A, we first review a formula to evaluate its elements. We then describe the SDP approach 
of Pong-Wong and Woolliams m, and compare the performance of the SDP approach and a 
simple SOCP formulation. 

To evaluate the elements of the numerator relationship matrix A , we separate the set of pedigree 
candidate members V := {1, 2,..., m} into the three disjoint groups: 

V = Vo U Vi u v 2 , 


where 

{ Vo = {i G V : both parents p(i ) and q(i ) are unknown} 

Vi = {i £ V : one parent p{i) is known and the other parent q(i) is unknown} 

V2 = {i € V : both parents p(i) and q(i) are known}. 

Figure [l] gives an example of pedigree with m = 9 members and illustrates its genealogical chart. 

In this example, Vo = {1,2}, Pi = {5},P2 = {3,4,6, 7, 8, 9}, and the parents of the 8th member 
are p{ 8) = 7 and q{ 8) = 6. We use a convention p(i) = 0 or q(i) = 0 if the parent p(i) or q(i) is 
unknown, respectively, and we can assume i > p(i) > q{i) for all i £ V without loss of generality. 
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pedigree id 

parents 

1 

unknown and unknown 

2 

unknown and unknown 

3 

1 and 2 

4 

1 and 2 

5 

2 and unknown 

6 

3 and 4 

7 

1 and 5 

8 

6 and 7 

9 

5 and 7 



Figure 1: An example of pedigree heredity and its diagram. 


The numerator relationship matrix A was defined by Wright [29], and its simplified formula 
was devised in [8]. The formula of [8] gives the elements An,. .., A nn in a recursive style: 

f Aij = Aji = for i = 1,..., m, j = 1,..., i - 1 


a _ii Aio,i(o 

-ri.%1 — J. i 2 


for i = 1,..., m, 


where we use a convention A pq = 0ifp = 0org = 0. When we apply this calculation to the 
example of Figure [l] we obtain the corresponding matrix A as follow: 
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Pong-Wong and Woolliams 
form of SDP is given by 


formulated the problem ([Tj) into an SDP problem. A standard 


max 

subject to 


E ra 

k= 1 c k z k 

- ££Li F kz k g 


(3) 


We use S n to denote the space of n x n symmetric matrices, and S” C 8 n to denote the space 
of positive semidefinite matrices of dimension n. The variables are z\,... ,z m E M, and the input 
data are ci,...,c m GR and F o, F i,..., F m E § n . 

The key step of Pong-Wong and Woolliams [2D] was the usage of the Schur complement of a 
matrix block. They noticed that the numerator relationship matrix A is always positive definite, 
and they utilized this property to convert the constraint on the group coancestry to a positive 
semidefinite condition on the symmetric matrix: 


x T Ax 


<0 ^ 


29 

x 


X 

A' 1 


G§ 


1+ra 


(4) 


With this positive semidefinite constraint, they converted (lj into the standard SDP form. The 
input vector c and the input matrices Fq , F \,..., F m in ([3 ) are given as follow mm- 


c = g, 
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Table 1: Computation time on GENCONT and SDP formulation (time in seconds) 


m (the number of pedigree) 

2,045 

10,100 

GENCONT optimal value 

438.56 

OOM* 

time 

67.43 


SDP formulation optimal value 

439.12 

47.76 

(with 4 cores) time 

70.21 

39200.78 


*OOM - “out of memory” 


( 1 


F n = 


-1 


Diag(w) 


-Diag(Z) 


26 > 0 T 

0 A ~ X J ) 


( 1 


F h = 


-1 


Diag(ei) 


-Diag(e,) 


0 -ej 


O J 


for k = 1,... ,m 


We use e* to denote the vector of all zeros except 1 at the ith element, and Diag(it) to denote 
the diagonal matrix whose diagonal elements are u. Note that the dimension of Fq,F\, ..., F m 
is 3m + 3. The software package OPSEL m automatically formulates the optimal selection into 
an SDP problem of this form. 

Table [l] shows the computed optimal values and the computation time of Meuwissen’s imple¬ 
mentation (GENCONT) [T6]J and the SDP formulation. We executed the numerical tests using 
Matlab R2015a on Windows 8.1 PC with Xeon CPU E3-1231 (3.40 GHz, 4 cores) and 8 GB mem¬ 
ory space. We used Windows, since GENCONT can run only on Windows. To solve the SDP ([3]) , 
we employed SDPA [30] , 

We observe from Table [l] that the SDP formulation attained a better optimal value than GEN¬ 
CONT. Actually, as shown in [2Dj, the optimal value of the SDP formulation was the exact optimal 
value, while the Lagrangian multiplier method implemented in GENCONT could not guarantee 
the optimality. For the large problem (m = 10,100), GENCONT gave up the computation, but 
the SDP formulation again obtained the optimal solution. On the other hand, the disadvantage 
of the SDP formulation is its computation time. Even using the parallel computing implemented 
in SDPA, the SDP formulation was slower than GENCONT for m = 2,045. Furthermore, the 
computation time for the large problem exceeded 10 hours even with four cores. When we used 
only one core, the SDP formulation would require longer computation time than 24 hours. 

To reduce the heavy computation time of the SDP formulation, we review the group coancestry 
constraint x 2 x < 6. With the positive definiteness of A, we focus on the property that this 
constraint can also be described as a second-order condition. The vector v £ M n<J is said to satisfy 
the second-order condition if v £ IC nq . The symbol IC nq denotes the second-order cone of dimension 


v £ M n « : vi > , 

n q 

l \ 

k =2 
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The second-order cone is a special case of positive semidefinite constraint. In fact, using the Schur 
complement, we can verify that 


v G K r 




/ 

V\ 

v 2 

v 3 ... 
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\ 


V2 

Vl 

0 ••• 
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v\ ■■■ 

0 
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0 ••• 

V\ 

/ 


gS 


Furthermore, since the numerator relationship matrix A is positive definite, we can apply 
the Cholesky factorization to A to obtain the upper triangular matrix U such that A = U 1 U. 
This factorization derives a second-order cone condition that corresponds to the group coancestry 
constraint: 

^-^<6 ^ x t U t Ux< 26 ^ \\Ux\\<V26 ^ ^ ^ ^ G/C 1+m . (5) 

Since the most difficult constraint in ([Tj) can be expressed using a second-order cone, it is 
natural to consider its SOCP formulation. A standard form of SOCP problem is described by 

max : c T z . . 

subject to : f 0 — Fz G x KA q . 


We use x K. nq to denote the Cartesian product of the non-negative orthant of dimension rq, 
:= {t> G M n ' : Uj > 0 for i = 1 ,...,n/}, and the second-order cone of dimension n q . In 
this problem, the variable vector is 2: G M m , and the input data are c G M m , f 0 G and 

F G M( n i +n 9) xm . A more general SOCP form than (|6j) can be defined so that it can handle the 
Cartesian product of multiple second-order cones, but one cone is enough for the discussion in this 
paper. 

If we formulate the optimal selection problem @ in a simple style, we obtain an SOCP problem: 
max : g T x 


( 1 \ 

-1 


/ 



subject to 


u 


- 1 

73 6 

V 0 / 


/ 

-1 

nr 

U 


x 


G x /C 1+m , 


(7) 


We use I to denote the identity matrix of dimension m. We call this formulation a simple SOCP 
formulation. We emphasize that this simple SOCP formulation also gives the exact optimal value 
of the optimal selection problem 0 due to the equivalence from 0 and 0; 

(* f- 0 €S i +ra « 

Since SOCP is a special case of SDP, we expected that we could solve the SOCP formulation ([T]) 
faster than the SDP formulation ([ 3 ]). In Table [2j we compare the computation times of the SDP 
and SOCP formulations. We used ECOS |2j as the SOCP solver for ([7]). For the small problem 
(m = 2,045), the SOCP formulation successfully reduced the computation time from 70.21 seconds 
to 0.28 seconds, so its speed-up was 250-times. However, for the large problem (m = 10,100), the 
speed-up was limited to 7-times. When we consider the computational complexity, the simple 
SOCP formulation would be slower than the SDP formulation for further large problems. On 
contrary to the fact that SOCP is a special case of SDP, the simple SOCP formulation did not 
seem promising. 
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Table 2: Computation time on SDP and simple SOCP formulations (time in seconds) 


m (the number of pedigree) 

2,045 

10,100 

SDP formulation (1 
simple SOCP formulation ( r 

P 

n 

70.21 

0.28 

39200.78 

5604.25 




Figure 2: The positions of non-zero elements of A (left) and A 1 (right) for the problem of size 

m = 10 , 100 . 


3 Efficient formulation based on SOCP 

To obtain an efficient formulation based on SOCP, we investigated key properties of the simple 
SOCP formulation Q. In particular, we focused on the structure of the numerator relationship 
matrix A and its inverse A -1 , since the SDP formulation uses only A -1 . Figure [ 2 ] illustrates 
the positions of the non-zero elements of A and A -1 for the problem of size m = 10,100. The 
dimensions of A and A' 1 corresponds to the size m. We can see from Figure [ 2 ] that A -1 is much 
sparser than A. The numbers of non-zero elements in A and A -1 are 56,754,980 and 56,092, 
respectively, hence their density against the fully-dense matrix (m 2 non-zero elements) are 55.6% 
and 0.0549%, respectively. When we apply the Cholesky factorization to A, the upper-triangular 
matrix U inherits the dense property. The number of non-zero elements in U is 23,171,296, and its 
density against the fully-dense upper triangular matrix is 45.4%. We should utilize the property 
that A~ x is remarkably sparse compared to A and U. 

From this observation, the direction we should pursue is to use A -1 instead of A itself. A key 
step of our approach is to introduce the new variable y = Ax. Then, the replacement x by A l y 
in the optimal selection |l]) leads to an equivalent optimization problem: 

max : ( A~ 1 g) T y 

subject to : (A~ 1 e) T y = 1 

V TA 2 ly < 6 

l < A x y < u. 

The Cholesky factor of A -1 is the transposed matrix of [7 _1 , denoted by U~ T : namely, A -1 = 
(U~ T ) T U ~ 1 . In practical implementation, since A -1 is remarkably sparse, we apply an appropri¬ 
ate row/column permutation like AMD (approximate minimum degree permutation) |3j to A -1 
in order to reduce the number of fill-in (the non-zero elements that newly appear in U~ T during 
the process of the Cholesky factorization). Using U~ T , we convert this problem into an SOCP 
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Table 3: Performance comparison on the the SDP formulation, the simple SOCP formulation, and 
the sparse SOCP formulation (time in seconds). 


m (size of pedigree) = 2,045 



nnz time (conversion) time (solver) time (total) 

SDP formulation ( 
simple SOCP formulation ( 
sparse SOCP formulation ( 

3 

7 

8 

) 

) 

) 

24300 0.52 69.55 70.21 

18201 0.10 0.04 0.28 

30348 0.37 0.05 0.55 


m (size of pedigree) = 10,100 



nnz time (conversion) time (solver) time (total) 

SDP formulation ( 
simple SOCP formulation ( 
sparse SOCP formulation ( 

3 

7 

8 

) 

) 

) 

121703 26.97 39173.03 39200.78 

23231801 15.95 5587.53 5604.25 

159570 24.14 0.68 25.60 


problem: 


max 


subject to 


( 1 \ 

-1 
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- 1 

V20 

V 0 / 


(■ A- l g) T y 

( (A- 1 c) r \ 

' -{A^ef 
A' 1 

-a - 1 y 


V 


0 

t-T 


u - 1 / 


G M+ +2m x K 1+m 


( 8 ) 


We call this formulation a sparse SOCP formulation of the optimal selection problem ([I]). 

We use the matrix A to formulate this new SOCP formulation, but we do not have to use 
A to solve the resultant SOCP problem with the interior-point methods. Furthermore, when we 
obtain the optimal solution y* of (j8|, we can also obtain the optimal solution x* of the original 
problem |l| via the relation x* = A l y* without using the dense matrix A. 

In Table [3j we compare the performance of the SDP formulation ([ 3 ]), the simple SOCP formu¬ 
lation Q, and the sparse SOCP formulation ([8]). In this table, the first column ‘nnz’ is the total 
number of non-zero elements of Fq,F\, .... F m for SDP and that of f 0 and F for SOCP. The 
second column is the computation time to convert the pedigree like Figure [I] and EBVs (estimated 
breeding values) into the SDP or SOCP formulations, and the third column is the computation 
time of the solvers. We applied SDPA with four cores to the SDP formulation and ECOS to the 
SOCP formulations. The fourth (last) column is the total computation time. 

Table[3]shows that for the large problem, the computation time of the sparse SOCP formulation 
was much shorter than that of the simple SOCP formulation. The sparse SOCP formulation seemed 
to have more complex structure than the simple SOCP formulation since it repeatedly contained 
A -1 in the matrix F. but the total number of non-zeros was reduced from 23,231,801 in the simple 
SOCP formulation to 159,570 in the sparse SOCP formulation. This led the computation time 
reduction for the SOCP solver. 

To reduce the total time further, we now address the computation time to build the SOCP 
formulation. In the case of m = 10,100, the conversion time occupied 94 % of the total time. In 
particular, the construction of the dense matrix A and its inversion are the principal bottlenecks. 

We investigate further the properties of A^ 1 , and employ a compact algorithm to construct 
A -1 proposed by Henderson [8], In the compact algorithm, we use the vector of inbreeding 





















Table 4: A compact algorithm to obtain the inverse of the numerator relationship matrix 
A” 1 <- O. 

, m 


for i = 1,2, 


1 {l+S{p(.i)){l-h p (i))+(l+5(q(i))(l-h q(i) ) 

if i e Vo then 

add bi to A^ 1 
elseif * G V\ then 
add bi to A^ 1 

add to Arj (i) , and A"^ 

add | to A^ )p( . } 

elseif i £ V 2 then 
add bi to A^ 1 
add — b to At\. 

2 z,p(z) 

add b to A~b r , 

4 p(i),p (?) 


endif 

endfor 


^p(i),V an< ^ 

1 AkO.pM’ an< ^ ^3(*)>?(*) 


coefficients defined by h := diag(A) — e, where diag(A) is a vector composed of the diagonal 
elements of A. Quaas [2Tj devised an efficient method to compute the inbreeding coefficients h 
without constructing the matrix A itself, and Masuda et al. utilized this method to implement 
their YAMS package |14| . The compact algorithm in [Sj with the enhancement [2T] is summarized 
in Table |4j We use an indicator function 5(p) such that 5(0) = 1, and 5{p) = 0 for p / 0. We also 
use the notation A” 1 to denote the (i,j) element of A -1 . 

The compact formula for Table [4] can be described as 



When we apply this formula to the example in Figure [lj we obtain the inverse of the numerator 
relationship matrix as follow: 
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A - 1 = — 
42 


V 

We can see in this example that A -1 has more zero-elements than A of Q. 

It is known that bi > 0 for any i = 1,2,... ,m from properties of the inbreeding coefficients. 
Therefore, the constraint on the group coancestry can be transformed into another second-order 
cone constraint: 


x T Ax 
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< e 
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y T A l y 
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G /C 1+m . 

Here, S is the matrix whose ith row vector is defined by 


Bi* — < 


VheJ for i g Vo 

Vbi (■ e-i ~ le p (i)) for i G V\ 

Vbl ( e i — \e p{ i) - for i G V 2 . 


We now replace the matrix U T in the sparse SOCP formulation 
another SOCP formulation: 


max 


{A~ l g) T y 


subject to : 


( 1 ^ 


( (A^e) T \ 

-1 


-( A~ l e) T 

u 


A" 1 

-l 


-A” 1 

V2 e 


0 

V 0 ) 


V B ) 


(9) 


by B, hence, we derive 


y e 


p2+2 m 


ic 1+m , 


( 10 ) 


We call this formulation a compact SOCP formulation. 

The combination of the compact algorithm in Table [4] and the second-order cone constraint ([ 9 ]) 
gives us a formulation that does not rely on any dense matrices. Table [5] adds the results of 
the compact SOCP formulation to Table [3j From Table [5j we observe that the solver time for 
the compact SOCP formulation was slightly shorter than the sparse SOCP formulation. A further 
computation time reduction was obtained in the computation time to build the SOCP formulations 
from the pedigree. In the case m = 10,100, the conversion was reduced from 24.14 seconds to 0.37 
seconds. Furthermore, we saved a lot of memory space. For the case m = 10,100, the matrix A 
required 778 MB of memory, while in contrast the memory required for B is only 2.33 MB. 


4 Numerical tests 

We conducted a numerical evaluation of the SDP and SOCP formulations on several datasets. 
The datasets are for problems of sizes 2045, 5050, 15100, 15222, 50100, 100100 and 300100. The 
data with the sizes 2,045 and 15,222 were from Scots pine orchards and loblolly pine orchards, 
respectively, and these data are available at the Dryad Digital Repository http://dx.doi.org/ 
10.5061/dryad. 9pn5m. The other data were generation by simulation of five cycles of breeding in 
a closed population using the approach of mm- 

Although we used Matlab for the comparison between the formulations, we also implemented 
the compact SOCP formulation with C+-K There were two reasons to implement it outside a 
Matlab environment. The first is that we can directly know the structure of non-zero elements 
that appear in the matrix B. Therefore, a specified data structure can accelerate the computation 
time to arrange the input data for building F. Secondly, we expect a software package that does 
not depend on commercial software would extend the opportunity for the field application in tree 
breeding. Due to the latter reason, we also excluded commercial SOCP solvers from the numerical 
tests. 
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Table 5: Performance comparison on the the SDP formulation and the SOCP formulations (time 
in seconds). 


m (size of pedigree) = 2,045 



nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation ('. 

3 

7 

8 

L0 

) 

) 

) 

) 

24300 0.52 69.55 70.21 

18201 0.10 0.04 0.28 

30348 0.37 0.05 0.55 

30249 0.01 0.05 0.19 


m (size of pedigree) = 10,100 



nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation ( 

3 

7 

8 

L0 

) 

) 

) 

) 

121703 26.97 39173.03 39200.78 

23231801 15.95 5587.53 5604.25 

159570 24.14 0.68 25.60 

153001 0.37 0.62 1.76 


One of the advantages of the compact SOCP formulation is that we do not need numerical rou¬ 
tines for the Cholesky factorization. If we implement the simple or sparse SOCP formulation with 
CTT, we have to embed certain numerical routines for the Cholesky factorization. In addition, 
the sparse Cholesky factorization requires a preprocessing by AMD [3j to derive its best perfor¬ 
mance. In contrast, the compact SOCP formulation obtains the matrix directly F as discussed in 
Section [3j 

The computation environment for the small problems (m < 10,100) was Matlab R2015a on a 
Windows 8.1 PC with Xeon E3-1231 (3.40 GHz) and 8 GB memory space. For the large problems 
(m > 15,100), we used Matlab R2014b on a Debian Linux server with Opteron 4386 (3.10 GHz) 
and 128 GB memory space, since 8 GB memory space was not enough for the SDP formulation. 

Table [6] shows the numerical results of the SDP and SOCP formulations. We observe from 
this table that the SDP formulation demanded rather long computation time, particularly for the 
larger problems. For the problem m = 15, 222, the compact SOCP formulation with C++ reduced 
the 22, 566 seconds of the SDP formulation to only 2.21 seconds. 

Another significant advantage of the compact formulation is memory consumption. The SDP 
formulation consumed 31 GB memory space to solve the problem m = 15,222, and it failed to 
handle m > 50,100, despite having 128 GB of memory available. If we use a rough estimation, the 
memory required for the largest problem m = 300,100 would be 12,000 GB. The simple SOCP 
formulation also suffered from a heavy memory requirement to store the dense matrix U. The 
sparse SOCP formulation did not require the dense matrix A in the resultant SOCP problem, but 
it utilized A and computed A -1 , hence it required the long conversion time in the same way as 
the SDP formulation. In contrast, the compact SOCP formulation consumed less than 766 MB 
memory space to solve even the largest problem m = 300,100. This memory reduction was mainly 
a result of employing Henderson’s algorithm. 

From the numerical results, we also observe that the compact SOCP formulation with C++ is 
faster than that with Matlab. The discrepancy between Matlab (99.65 seconds) and C++ (82.41 
seconds) in the largest problem was due to a specified data structure written in C++. In particular, 
the structure was effective when we arranged the pedigree before building F. 
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Table 6: Numerical results on SDP and SOCP formulations (time in seconds). 


m 

size of pedigree) = 2,045 


nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation ( 
compact SOCP formulation with C- 

3 

7 

* 

0 

-4 

) 

24300 0.52 69.55 70.21 

18201 0.10 0.04 0.28 

30348 0.37 0.05 0.55 

30249 0.01 0.05 0.20 

28246 0.01 0.06 0.09 

to 

size of pedigree) = 5,050 


nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation (] 
compact SOCP formulation with C- 

3 

7 

s 

TJ 

-4 

) 

) 

) 

60853 4.63 887.32 892.30 

6812127 1.60 696.10 698.15 

78405 3.67 0.19 4.21 

76533 0.04 0.19 0.58 

76533 0.01 0.21 0.28 

m (size of pedigree) = 15,100 


nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation ( 
compact SOCP formulation with C- 

3 

7 

4 

75 

-4 

) 

) 

) 

) 

181703 157.41 21836.63 21994.87 

54063065 26.17 38733.01 38760.00 

234760 145.53 2.13 148.49 

227989 0.04 2.06 2.92 

227989 0.02 1.95 1.99 

to (size of pedigree) = 15,222 


nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation (] 
compact SOCP formulation with C- 

3 

7 

4 

0 

44 

) 

) 

) 

) 

181947 161.99 22403.30 22566.11 

7889551 17.96 618.18 636.95 

227758 150.07 2.13 153.01 

227203 0.04 2.20 3.05 

227203 0.02 2.16 2.21 

to (size of pedigree) = 50,100 


nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation ( 
compact SOCP formulation with C- 

3 

7 

4 

0 

-4 

) 

1 

) 

OOM* 

OOM 

759294 4989.55 7.58 4999.90 

753023 0.15 7.71 10.63 

753023 0.08 7.48 7.69 

to (size of pedigrees) = 100,100 


nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation ( 
compact SOCP formulation with C- 

3 

7 

4 

75 

44 

) 

OOM 
OOM 
> 24 hours** 

1502983 0.35 18.44 24.76 

1502983 0.16 17.57 17.92 

to (size of pedigrees) = 300,100 


nnz time (conversion) time (solver) time (total) 

SDP formulation 
simple SOCP formulation 
sparse SOCP formulation 
compact SOCP formulation ( 
compact SOCP formulation with C- 

3 

7 

4 

0 

-4 

) 

OOM 

OOM 

OOM 

4503065 1.10 82.41 99.65 

4503065 0.51 78.54 79.62 


* OOM = “out of memory” 

* > 24 hours = “the computation failed to complete with in in 24 hours.” 
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5 Conclusions and future directions 


We examined the SOCP formulations for the optimal selection problem arising from tree breed¬ 
ing. We employed the transformation x = A l y based on the sparsity of AY 1 and the efficient 
method to build A _1 by the Henderson’s algorithm with the Quaas enhancement. The compact 
SOCP formulation thus did not involve any dense matrix or the Cholesky factorization. The nu¬ 
merical results demonstrated that the compact SOCP formulation obtained the optimal solution 
significantly faster than the existing SDP formulation. 

The SOCP formulations proposed in this paper may look rather simple for researchers in the 
held of mathematical optimization. However, the computation time reduction in the optimal 
selection problem will help improve operational application in tree breeding. We expect that 
this paper will be one of bridges to introduce efficient approaches cultivated in mathematical 
optimization to tree breeders. 

In this paper, we discussed an unequal deployment of parental genotypes to seed orchards, 
where the contributions of selected members are not required to be equal. To deal with equal 
deployment, as might be appropriate for selection of a fixed-size breeding population, we will 
need a method to solve a mixed integer SOCP problem; that is an SOCP problem in which some 
variables are constrained to be integers. The structure of the SOCP formulation developed in this 
paper will be a basis for an efficient method to consider the mixed integer SOCP problem in the 
optimal selection problems. 
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